Where do most of the players in FIFA 2018 come from? Is it South America or Europe? What is the most common age of the players listed in FIFA 2018? What is the age range of players? What is the distribution of their performance? These are the questions I would like to answer through exploratory data analysis. I will use the ggplot2 library that I learnt in the lesson, coupled with plotly for interactive visualization.
The dataset features every player in FIFA 2018 with 70+ attributes. It contains personal attributes like Nationality, Photo, Club, Age, Wage, Salary etc. I downloaded the dataset from https://www.kaggle.com/thec03u5/fifa-18-demo-player-dataset.
The dataset is tidy except for a few columns such as Wage, Value and Preferred.Positions. I will extract the numeric values from the Wage and Value columns, and pull out the most preferred position from the Preferred.Positions column, on the assumption that the positions are listed in order of preference.
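The cleaning step described here could be sketched in base R roughly as follows. It assumes the raw strings look like `€123M` or `€625K` and that positions are separated by spaces; neither format is shown in this report, and `parse_money_k` and `first_position` are hypothetical helper names. Amounts are returned in thousands, which matches the scale of the summaries further down.

```r
# Hypothetical helper: turn strings such as "€123M" or "€625K" into
# thousands of euros. The input format is an assumption.
parse_money_k <- function(x) {
  # Keep only digits and the decimal point, then convert to numeric.
  amount <- suppressWarnings(as.numeric(gsub("[^0-9.]", "", x)))
  # "M" means millions (x 1000 in thousands), "K" means thousands.
  # Anything without a recognised suffix becomes NA.
  multiplier <- ifelse(grepl("M", x, fixed = TRUE), 1000,
                ifelse(grepl("K", x, fixed = TRUE), 1, NA_real_))
  amount * multiplier
}

# Hypothetical helper: keep the first position of a space-separated list,
# assuming positions are listed in order of preference.
first_position <- function(x) {
  sub("\\s.*$", "", trimws(x))
}

parse_money_k(c("\u20ac123M", "\u20ac625K"))
first_position(c("ST LW ", "GK"))
```

Mapping strings without a suffix to `NA` instead of 0 is a design choice in this sketch: it keeps unparseable entries visible in `summary()` instead of silently pulling the minimum down.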
## Name Age
## J. Rodríguez: 7 Min. :16.00
## J. Valencia : 7 1st Qu.:21.00
## J. Williams : 7 Median :25.00
## D. González : 6 Mean :25.14
## Danilo : 6 3rd Qu.:28.00
## Felipe : 6 Max. :47.00
## (Other) :17942
## Photo
## https://cdn.sofifa.org/48/18/players/197083.png: 2
## https://cdn.sofifa.org/48/18/players/198113.png: 2
## https://cdn.sofifa.org/48/18/players/198140.png: 2
## https://cdn.sofifa.org/48/18/players/198329.png: 2
## https://cdn.sofifa.org/48/18/players/198584.png: 2
## https://cdn.sofifa.org/48/18/players/198614.png: 2
## (Other) :17969
## Nationality Overall Potential
## Length:17981 Min. :46.00 Min. :46.00
## Class :character 1st Qu.:62.00 1st Qu.:67.00
## Mode :character Median :66.00 Median :71.00
## Mean :66.25 Mean :71.19
## 3rd Qu.:71.00 3rd Qu.:75.00
## Max. :94.00 Max. :94.00
##
## Club Value Wage
## : 248 Length:17981 Length:17981
## Villarreal CF : 35 Class :character Class :character
## Borussia Dortmund: 34 Mode :character Mode :character
## FC Nantes : 34
## Manchester United: 34
## OGC Nice : 34
## (Other) :17562
## Preferred.Positions Continent
## Length:17981 Length:17981
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
Age ranges from 16 to 47 years, with a mean of 25.14 and a median of 25. I suspect Age is roughly normally distributed. I will plot a histogram in the univariate plots section to see if this is the case.
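A minimal sketch of such a histogram with ggplot2 follows. The data frame `toy` is a made-up sample standing in for the real Age column, and the skewness calculation is an extra check that is not part of the original analysis.

```r
library(ggplot2)

# Made-up sample standing in for the real Age column (assumption:
# roughly bell-shaped, clipped to the observed 16-47 range).
set.seed(1)
toy <- data.frame(
  Age = pmin(pmax(round(rnorm(500, mean = 25, sd = 4.5)), 16), 47)
)

# One bar per year of age makes the shape easy to read.
age_hist <- ggplot(toy, aes(x = Age)) +
  geom_histogram(binwidth = 1, fill = "steelblue", colour = "white") +
  labs(x = "Age (years)", y = "Number of players")

# Sample skewness: values near 0 are consistent with a symmetric shape,
# positive values indicate a longer right tail.
skew <- mean((toy$Age - mean(toy$Age))^3) / sd(toy$Age)^3
```

Printing `age_hist` draws the plot. On the real data, a mean slightly above the median (25.14 versus 25) would suggest a mild right skew instead of a perfectly symmetric bell.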
Looking at the Nationality column, the top 5 countries are all from either Europe or South America. In the univariate plot section I will group the data by Nationality and plot the counts on a map to visualize the distribution of players by country.
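The group-by step might look like the following dplyr sketch. The rows in `players` are made up for illustration, and only a few of the summary columns are computed.

```r
library(dplyr)

# Made-up rows for illustration; the real data has one row per player.
players <- data.frame(
  Nationality = c("Spain", "Spain", "Brazil", "Germany", "Spain", "Brazil"),
  Overall     = c(70, 82, 75, 66, 64, 88),
  stringsAsFactors = FALSE
)

# One row per country: average rating, best rating and player count,
# sorted so the countries with the most players come first.
by_nation <- players %>%
  group_by(Nationality) %>%
  summarise(
    mean_Overall = mean(Overall),
    max_Overall  = max(Overall),
    n            = n()
  ) %>%
  arrange(desc(n))
```

The `n` column is the quantity that would be mapped to colour on a country-level map.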
The Overall and Potential columns both range from 46 to 94, with means of 66 and 71 respectively. The 5-point difference in means makes me wonder how many players have scope for improvement. I would like to explore the difference between the two columns in the plot section below. I expect these two columns to be heavily correlated.
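As a rough sketch, the difference and the correlation can be computed in base R as shown here. The five rows are made up, and `PO_Diff` is the name this report uses for the difference variable.

```r
# Made-up rows with the two rating columns from the dataset.
players <- data.frame(
  Overall   = c(60, 66, 71, 80, 94),
  Potential = c(72, 71, 75, 82, 94)
)

# Room for improvement: how far each player is below his potential.
players$PO_Diff <- players$Potential - players$Overall

# How many players can still improve, and what share is already at peak.
n_can_improve <- sum(players$PO_Diff > 0)
share_at_peak <- mean(players$PO_Diff == 0)

# Pearson correlation between current and potential rating.
rho <- cor(players$Overall, players$Potential)
```

A correlation close to 1 together with a mostly positive `PO_Diff` would mean that Potential largely tracks Overall, shifted upward for players who are still developing.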
No surprises here either: Overall scores cluster around 66.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 10.00 14.00 21.00 31.95 36.00 565.00 12727
The Wage variable has a lot of NAs: 12,727 of the 17,981 rows, or about 71%. I will discard this variable from any further analysis.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 10 300 625 2252 1600 123000 1548
The Value variable is very intriguing. The median is 625K, meaning half the players are valued below 625K and half above. The 3rd quartile is 1.6M and the maximum is 123M. The mean (about 2.25M) is well above the median, which points to a strongly right-skewed distribution. In fact, I expected such a pattern, because most players are not valued in the millions, but I would like to explore the high-valued players further.
## # A tibble: 6 x 11
## Nationality mean_Overall max_Overall mean_Potential max_Potential
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 United Kingdom 63.08509 89 69.93548 90
## 2 Germany 65.90088 92 71.57982 92
## 3 Spain 69.93916 90 74.78214 92
## 4 France 67.28630 88 73.02454 94
## 5 Argentina 67.76580 93 72.45907 93
## 6 Brazil 70.89532 92 72.86700 94
## # ... with 6 more variables: mean_Age <dbl>, mean_Diff <dbl>,
## # max_Diff <dbl>, mean_Value <dbl>, max_Value <dbl>, n <int>
Clearly the reddest regions are in South America and Europe. The UK has the highest number of players. Most of Asia and Africa is grey, meaning fewer than 60 players come from these regions. In the Middle East there is a stark contrast between nations: Saudi Arabia is much redder than the others. Canada and New Zealand are surprising: both are high-income countries but are grey. Perhaps population affects the number of players from a country.
After exploring the dataset's variables, I have the following conclusions:
The dataset is tidy. Apart from a few changes like extracting numbers from a variable, I don’t need to make any more changes.
The most interesting features in the dataset are Nationality, Age, Potential, Overall, Value, and Preferred.Positions. A brief description of the features is as follows:
Other features like Continent might be helpful; I will explore whether they are.
I created a variable PO_Diff, which is the difference between Potential and Overall. I also created a variable Continent.
I extracted the numerical values from the Wage and Value variables. I also pulled out the most preferred position from the Preferred.Positions variable.
From the curve above I see meaningful correlation between the following:
Normally one would expect the value of a player to rise with potential, and the correlation suggests that it does. However, there are players with potential above 90 and a value of only 975K. Maybe a player's preferred position has an impact on his value.
I have made the following observations:
Looking at the plots above, I see that defensive positions like LB and RB are valued less, while forward and striker positions are worth more. Here is a link describing the positions: https://en.wikipedia.org/wiki/Association_football_positions.
The following relationships have been observed:
——
I chose this plot because it clearly shows how the players in FIFA 2018 are distributed by country. Most of the players are from South America and Europe. The UK has the highest number of players featured.
I expected a player to be worth less in the early years of his career, to stabilize in his prime years, and eventually to lose value because of his age. The curve reaffirms my intuition. It shows that at younger ages (16-22) a player's worth rises with age; it then becomes stable and eventually starts a downward curve (around age 33). Maybe a younger player gains popularity and thus increases his worth, or he improves his overall score and thus becomes more valuable. A good example is the UK, which has the youngest mean age and, among the six nations in the summary table above, the largest mean gap between Potential and Overall (about 6.9 points).
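One simple way to check this shape numerically is the median value per age, sketched here in base R. This is not necessarily how the curve in the plot was produced, and the rows are made up; the median is used because Value is heavily skewed.

```r
# Made-up rows: Age in years, Value in thousands.
players <- data.frame(
  Age   = c(17, 17, 22, 22, 27, 27, 34, 34),
  Value = c(100, 300, 900, 1500, 2000, 2400, 400, 800)
)

# Median value for each age; the median is robust to a few star players.
value_by_age <- aggregate(Value ~ Age, data = players, FUN = median)

# Age at which the median value is highest.
peak_age <- value_by_age$Age[which.max(value_by_age$Value)]
```

On the real data, a rise followed by a plateau and a decline in `value_by_age` would match the pattern described in this paragraph.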
I am choosing the above plot as the final descriptive plot because it completes the story. While a player's overall score is a good indicator of his worth, his preferred position impacts his worth immensely. At the same overall score, forward players are more likely to be worth more than middle or back players.
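The grouping of positions into Forward, Middle and Back could be sketched as below. The assignment of position codes to groups is an assumption made for illustration (the report does not list it), `position_group` is a hypothetical helper, and the rows in `toy` are made up.

```r
# Hypothetical helper: map a position code to a broad group.
# The code-to-group assignment below is an assumption.
position_group <- function(pos) {
  forward  <- c("ST", "CF", "LW", "RW")
  midfield <- c("CAM", "CM", "CDM", "LM", "RM")
  back     <- c("CB", "LB", "RB", "LWB", "RWB")
  ifelse(pos %in% forward, "Forward",
  ifelse(pos %in% midfield, "Middle",
  ifelse(pos %in% back, "Back",
  ifelse(pos == "GK", "Goalkeeper", NA_character_))))
}

# Made-up rows: preferred position and Value in thousands.
toy <- data.frame(
  Position = c("ST", "LW", "CM", "CB", "LB"),
  Value    = c(9000, 7000, 5000, 3000, 2500),
  stringsAsFactors = FALSE
)
toy$Group <- position_group(toy$Position)

# Median value per group.
median_value <- tapply(toy$Value, toy$Group, median)
```

To compare groups at the same overall score, the same summary could be computed within bands of Overall instead of over all players at once.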
Overall, I selected important columns that would allow me to form insights about the characteristics of players featured in FIFA 2018.
Most players featured in FIFA 2018 are from South America and Europe. Most of them are clustered around 25 years of age, and their Overall scores cluster around 66. Interestingly, most players are at their best. Younger players have a better chance of improving. A player's value is affected by his preferred position. These are the conclusions I have drawn from the exploratory analysis of the FIFA 2018 dataset.
The dataset is limited, as it only covers FIFA 2018. I would have loved to explore how players' Overall and Value features evolve over time. The Wage column was mostly missing, so I could not form any meaningful insights from it.
The analysis could be extended to build the best squad within a budget. It could also address questions such as whether a 2-3-5 (pyramid) formation is better than a 4-2-4 formation for the squad, or whether a team's performance would improve if the club invests in a new player.